Sequential Pattern Classification without Explicit Feature Extraction

Author

  • Hansheng Lei
Abstract

Feature selection, representation and extraction are integral to statistical pattern recognition systems. Usually features are represented as vectors that capture expert knowledge of measurable discriminative properties of the classes to be distinguished. The feature selection process entails manual expert involvement and repeated experiments. Automatic feature selection is necessary when (i) expert knowledge is unavailable, (ii) distinguishing features among classes cannot be quantified, or (iii) a fixed-length feature description cannot faithfully reflect all possible variations of the classes, as in the case of sequential patterns (e.g., time series data). Automatic feature selection and extraction are also useful when developing pattern recognition systems that are scalable across new sets of classes. For example, an OCR system designed with an explicit feature selection process for the alphabet of one language usually does not scale to the alphabet of another language. One approach to avoiding explicit feature selection is to use a (dis)similarity representation instead of a feature vector representation. The training set is represented by a similarity matrix, and new objects are classified based on their similarity to samples in the training set. A suitable similarity measure can also be used to increase the classification efficiency of traditional classifiers such as Support Vector Machines (SVMs). In this thesis we establish new techniques for sequential pattern recognition without explicit feature extraction for applications where (i) a robust similarity measure exists to distinguish classes and (ii) the classifier (such as an SVM) utilizes a similarity measure for both training and evaluation. We investigate the use of similarity measures for applications such as on-line signature verification and on-line handwriting recognition.
A paucity of training samples can render traditional training methods ineffective, as in the case of on-line signatures, where the number of training samples is rarely greater than 10. We present a new regression measure (ER2) that can classify multi-dimensional sequential patterns without the need for training with a large number of prototypes. When sufficient training prototypes are available, we use ER2 as a preprocessing filter to speed up SVM evaluation. We demonstrate the efficacy of a two-stage recognition system by using Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) in the supervised classification framework of SVM. We present experiments with off-line digit images where the pixels are simply ordered in a predetermined manner to simulate sequential patterns. We describe the Generalized Regression Model (GRM) for the unsupervised classification (clustering) of sequential patterns.
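
The idea of classifying a new sequence by its similarity to a handful of stored prototypes, without any feature vector, can be illustrated with a minimal sketch. The sketch is not the ER2 measure of the thesis, which the abstract does not define. As a stand-in it assumes the squared Pearson correlation (the R² of a simple linear regression) between two one-dimensional sequences of equal length. The names `r_squared` and `classify` are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two equal-length sequences.

    Assumed stand-in similarity, NOT the ER2 measure from the thesis.
    Returns a value in [0, 1]; 1 means y is an exact linear function of x.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no shape information
    return (sxy * sxy) / (sxx * syy)


def classify(query, prototypes):
    """Return (label, score) of the prototype most similar to `query`.

    `prototypes` is a list of (sequence, label) pairs; no training step
    is involved, only pairwise similarity evaluation.
    """
    best_label, best_score = None, -1.0
    for sequence, label in prototypes:
        score = r_squared(query, sequence)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Two toy prototypes, one per class (illustrative data only).
prototypes = [
    ([0, 1, 2, 3, 4], "ramp"),
    ([0, 1, 0, -1, 0], "wave"),
]
label, score = classify([1, 3, 5, 7, 9], prototypes)
print(label, score)
```

Because this stand-in is invariant to scale and offset, the query matches the "ramp" prototype even though its values differ. It is also invariant to sign, so a mirrored sequence scores just as high, and real sequences of unequal length would first need resampling or alignment.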

Similar resources

Comparison of Parametric and Non-parametric EEG Feature Extraction Methods in Detection of Pediatric Migraine without Aura

Background: Migraine headache without aura is the most common type of migraine, especially among pediatric patients. Diagnosing migraine from quantitative electroencephalography measurements through feature classification has always been a great challenge. It has been proven that different feature extraction and classification methods vary in terms of performance regarding detection and di...

Dimensionality Reduction of Hyperspectral Data to Increase Class Separability and Preserve Data Structure

By gathering hundreds of spectral bands from the surface of the Earth, hyperspectral imaging allows us to separate materials with similar spectra. Hyperspectral images can be used in many applications such as land chemical and physical parameter estimation, classification, target detection, unmixing, and so on. Among these applications, classification is of particular interest. A hyperspectral im...

Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...

A Real-Time Electroencephalography Classification in Emotion Assessment Based on Synthetic Statistical-Frequency Feature Extraction and Feature Selection

Purpose: To assess three main emotions (happy, sad and calm) by various classifiers, using appropriate feature extraction and feature selection. Materials and Methods: In this study, a combination of Power Spectral Density and a series of statistical features is proposed as statistical-frequency features. Next, a feature selection method from pattern recognition (PR) Tools is presented to e...

PCA, SFS or LDA: What is the Best Choice for Extracting Speaker Features?

Feature extraction is the process of deriving new weakly correlated features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. The selection and quality of the features representing each pattern have considerable bearing on the success of subsequent pattern classification. In this paper, we s...

Publication date: 2005